Data Type


A data type in Hive specifies the type of a column (field) in a Hive table, and therefore the kind of values that can be stored in that column. The data types supported by Hive fall into two broad groups: primitive types and complex types.

Hive Data Types



The primitive data types supported by Hive are listed below:


Numeric Types

  • TINYINT (1-byte signed integer, from -128 to 127)
  • SMALLINT (2-byte signed integer, from -32,768 to 32,767)
  • INT (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
  • BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • FLOAT (4-byte single precision floating point number)
  • DOUBLE (8-byte double precision floating point number)
  • DECIMAL (arbitrary-precision decimal; user-definable precision and scale since Hive 0.13.0)

Date/Time Types

  • TIMESTAMP
  • DATE

String Types

  • STRING
  • VARCHAR
  • CHAR

Misc Types

  • BOOLEAN
  • BINARY
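
The primitive types above can be seen together in a single table definition. This is only a minimal sketch: the table name employee_demo and all column names are made up for illustration, and it assumes a Hive version recent enough to support DATE, VARCHAR, CHAR and DECIMAL with precision and scale (0.13.0 or later).

```sql
-- Hypothetical table; names are illustrative, not from any real schema.
CREATE TABLE employee_demo (
  id          INT,             -- 4-byte signed integer
  age         TINYINT,         -- 1-byte signed integer
  dept_code   SMALLINT,        -- 2-byte signed integer
  badge_no    BIGINT,          -- 8-byte signed integer
  rating      FLOAT,           -- single precision floating point
  score       DOUBLE,          -- double precision floating point
  salary      DECIMAL(10,2),   -- precision 10, scale 2
  hired_on    DATE,
  updated_at  TIMESTAMP,
  name        STRING,
  city        VARCHAR(50),     -- variable length, at most 50 characters
  grade       CHAR(1),         -- fixed length
  active      BOOLEAN,
  photo       BINARY
);
```
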

Apart from these primitive data types, Hive offers some complex data types, which are listed below:

Complex Types

  • arrays: ARRAY<data_type>
  • maps: MAP<primitive_type, data_type>
  • structs: STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • union: UNIONTYPE<data_type, data_type, ...>

Complex types can be built up from primitive types and other complex types. The data types of the elements or fields in a complex type are specified using angle bracket notation. Hive currently supports four complex data types:

ARRAY – An ordered sequence of elements of the same type, indexed by zero-based integers. It is similar to an array in Java.
Example – array('siva', 'bala', 'praveen'); for a column arr holding this value, the second element is accessed with arr[1].
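
The ARRAY example can be sketched in HiveQL as follows. The table array_demo and its column names are hypothetical, and the last query assumes a Hive version that allows SELECT without a FROM clause (0.13.0 or later).

```sql
-- Hypothetical table with an array-of-strings column.
CREATE TABLE array_demo (names ARRAY<STRING>);

-- Indexes are zero-based, so names[1] is the second element.
SELECT names[0] AS first_name,
       names[1] AS second_name
FROM array_demo;

-- The array() constructor builds a value inline; [1] picks 'bala'.
SELECT array('siva', 'bala', 'praveen')[1];
```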

MAP – A collection of key-value pairs. Values are looked up by key using bracket notation (e.g., ['key']).
Example – 'first' -> 'bala', 'last' -> 'PG' is represented as map('first', 'bala', 'last', 'PG'). For a column m holding this value, 'bala' is accessed with m['first'].
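
A small HiveQL sketch of the MAP example, assuming a hypothetical table map_demo with a single map column named person:

```sql
-- Hypothetical table; keys must be a primitive type, values can be any type.
CREATE TABLE map_demo (person MAP<STRING, STRING>);

-- Look up values by key with bracket notation.
SELECT person['first'] AS first_name,
       person['last']  AS last_name
FROM map_demo;

-- The map() constructor takes alternating keys and values.
SELECT map('first', 'bala', 'last', 'PG')['first'];
```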

STRUCT – Similar to a struct in C. It is a record type that encapsulates a set of named fields, each of which can be of any data type. Fields of a STRUCT are accessed using dot (.) notation.
Example – For a column c of type STRUCT<a:INT, b:INT>, the a field is accessed by the expression c.a.
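
The STRUCT example might look like this in HiveQL. The table name struct_demo is an assumption made for illustration; the column c and its fields a and b follow the example above.

```sql
-- Hypothetical table with a struct column c that has two INT fields.
CREATE TABLE struct_demo (c STRUCT<a:INT, b:INT>);

-- Fields are read with dot notation.
SELECT c.a, c.b
FROM struct_demo;

-- named_struct() builds a struct value from alternating field names and values.
SELECT named_struct('a', 1, 'b', 2);
```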

UNIONTYPE – Similar to a union in C. At any point in time, a UNIONTYPE value holds exactly one of its specified data types.
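
A rough sketch of declaring and constructing a UNIONTYPE value is shown below. The table union_demo is hypothetical, and the example assumes the built-in create_union function, whose first argument is a zero-based tag that selects which of the remaining arguments the union holds. Note that the Hive documentation describes UNIONTYPE support as incomplete, so behavior may vary between versions.

```sql
-- Hypothetical table whose column u holds either an INT or a STRING.
CREATE TABLE union_demo (u UNIONTYPE<INT, STRING>);

-- Tag 0 selects the first candidate value (the INT 1);
-- tag 1 would select the second (the STRING 'x').
SELECT create_union(0, 1, 'x');
```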
